A random-walk Metropolis sampler is geometrically ergodic if its
equilibrium density is super-exponentially light and satisfies a
curvature condition [Stochastic Process. Appl. 85 (2000) 341-361]. Many
applications, including Bayesian analysis with conjugate priors of
logistic and Poisson regression and of log-linear models for categorical
data, result in posterior distributions that are not super-exponentially
light. We show how to apply the change-of-variable formula for
diffeomorphisms to obtain new densities that do satisfy the conditions
for geometric ergodicity. Sampling the new variable and mapping the
results back to the old gives a geometrically ergodic sampler for the
original variable. This method of obtaining geometric ergodicity has
very wide applicability.

1. Introduction. Markov chain Monte Carlo (MCMC) using the Metro- 
polis-Hastings-Green algorithm [Metropolis et al. (1953), Hastings (1970), 
Green (1995)] or its special case the Gibbs sampler [Geman and Geman 
(1984), Tanner and Wong (1987), Gelfand and Smith (1990)] has become 
very widely used [Gilks, Richardson and Spiegelhalter (1996), Brooks et al. 
(2011)], especially after Gelfand and Smith (1990) pointed out that most 
Bayesian inference can be done using MCMC, and little can be done without 
it. 

In ordinary, independent and identically distributed Monte Carlo (OMC), 
the asymptotic variance of estimates is easily calculated [Geyer (2011), Sec- 
tion 1.7]. In MCMC, the properties of estimates are more difficult to handle 
theoretically [Geyer (2011), Section 1.8]. A Markov chain central limit
theorem (CLT) may or may not hold [Tierney (1994), Chan and Geyer (1994)].
If it does hold, the asymptotic variance of MCMC estimates is more
difficult to estimate than for OMC estimates, but estimating the asymptotic
variance of the MCMC estimates is doable [Geyer (1992), Flegal and Jones
(2010), Geyer (2011), Section 1.10]. The CLT holds for all
$L^{2+\varepsilon}$ functionals of a Markov chain if the Markov chain is
geometrically ergodic [Chan and Geyer (1994)]. For a reversible Markov
chain [Geyer (2011), Section 1.5] the CLT holds for all $L^2$ functionals
if and only if the Markov chain is geometrically ergodic [Roberts and
Rosenthal (1997)]. The CLT may hold
for some functionals of a Markov chain when the Markov chain is not ge- 
ometrically ergodic [Gordin and Lifsic (1978), Maigret (1978), Kipnis and 
Varadhan (1986), Chan (1993), Tierney (1994), Chan and Geyer (1994), 
Roberts and Rosenthal (1997, 2004), Jones (2004)], but then it is usually 
very difficult to verify that a CLT exists for a given functional of the 
Markov chain. Thus geometric ergodicity is a very desirable property for 
a Markov chain to have. This is especially true because most instances of 
the Metropolis-Hastings-Green algorithm are reversible or can be made to 
be reversible [Geyer (2011), Sections 1.5, 1.12 and 1.17], so, as stated above, 
geometric ergodicity implies the CLT holds for all $L^2$ functionals of the
Markov chain, which makes reversible geometrically ergodic MCMC just as 
good as OMC in this respect. 

Geometric ergodicity also plays a key role in the theory of calculable 
nonasymptotic bounds for Markov chain estimators [Rosenthal (1995b),
Łatuszyński and Niemiro (2011), Łatuszyński, Miasojedow and Niemiro
(2012)], but is only half of what must be done to establish this type of 
result. The other half is establishing a minorization condition. The proof 
techniques involved in establishing geometric ergodicity and in establishing 
minorization conditions, however, have little in common. We deal only with 
establishing geometric ergodicity. 

1.1. The random-walk Metropolis algorithm. The Metropolis-Hastings-Green
algorithm generates a Markov chain having a specified invariant
probability distribution. We restrict our attention to distributions of
continuous random vectors, those having a density $\pi$ with respect to
Lebesgue measure on $\mathbb{R}^k$. If $\pi$ is only known up to a
normalizing constant, then the Metropolis-Hastings-Green algorithm still
works.

We describe only the random-walk Metropolis algorithm [terminology
introduced by Tierney (1994)]. This simulates a Markov chain
$X_1, X_2, \ldots$ having $\pi$ as an invariant distribution. It is
determined by $\pi$ and another function $q : \mathbb{R}^k \to \mathbb{R}$
that is a properly normalized probability density with respect to
Lebesgue measure on $\mathbb{R}^k$ and is symmetric about zero. Each
iteration does the following three steps, where $X_n$ is the state of the
Markov chain before the iteration and $X_{n+1}$ is the state after the
iteration. Simulate $Z_n$ having the distribution $q$, and set
$Y_n = X_n + Z_n$. Calculate

(1) $a(X_n, Y_n) = \min(1, \pi(Y_n) / \pi(X_n))$.

Set $X_{n+1} = Y_n$ with probability $a(X_n, Y_n)$, and set
$X_{n+1} = X_n$ with probability $1 - a(X_n, Y_n)$.

The only requirement is $\pi(X_1) > 0$. The operation of the algorithm
itself then ensures that $\pi(X_n) > 0$ almost surely for all $n$, so (1)
always makes sense.

The proposal density $q$ and target density $\pi$ are arbitrary. The
algorithm always produces a (not necessarily ergodic) reversible Markov
chain having invariant density $\pi$ regardless of what $q$ is chosen. If
$q$ is everywhere positive, then the Markov chain is necessarily ergodic
[irreducible and positive Harris recurrent, Tierney (1994), Corollary 2].
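
The three steps of an iteration can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation in the R package mcmc: the function name `rw_metropolis`, the spherical normal proposal with standard deviation `scale`, and the standard normal test target are assumptions made for the example. The acceptance step is done on the log scale, which is equivalent to (1).

```python
import math
import random


def rw_metropolis(log_pi, x0, n_iter, scale=1.0, seed=0):
    """Random-walk Metropolis with a N(0, scale^2 I) proposal (an assumed q).

    log_pi: log of the (possibly unnormalized) target density
    x0:     initial state, a list of floats with pi(x0) > 0
    """
    rng = random.Random(seed)
    x = list(x0)
    lp_x = log_pi(x)
    if lp_x == -math.inf:
        raise ValueError("initial state must have positive density")
    chain = []
    for _ in range(n_iter):
        # Proposal Y_n = X_n + Z_n, with Z_n drawn from the symmetric q.
        y = [xi + rng.gauss(0.0, scale) for xi in x]
        lp_y = log_pi(y)
        # Accept with probability min(1, pi(Y_n) / pi(X_n)), on the log scale.
        if math.log(1.0 - rng.random()) < lp_y - lp_x:
            x, lp_x = y, lp_y
        chain.append(list(x))
    return chain


def log_std_normal(x):
    # Unnormalized log density of a standard normal, used only as a test target.
    return -0.5 * sum(xi * xi for xi in x)


chain = rw_metropolis(log_std_normal, [0.0], 20000, scale=2.4)
mean = sum(s[0] for s in chain) / len(chain)
var = sum((s[0] - mean) ** 2 for s in chain) / len(chain)
```

Only the log of the unnormalized density is needed, which matches the remark above that a normalizing constant is not required.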

The R package mcmc [Geyer and Johnson (2012)] provides a user-friendly 
implementation of the random-walk Metropolis algorithm combined with 
the variable transformation methodology described in this article in its 
morph.metrop function. The user provides an R function that evaluates 
$\log \pi$, and the metrop function in that package does the simulation. If the
user correctly codes the function that evaluates $\log \pi$, then the morph.metrop
function is guaranteed to simulate a reversible ergodic Markov chain
having invariant density $\pi$. This gives an algorithm having an enormous range
of application, which includes all Bayesian inference for models with con- 
tinuous parameters and continuous prior distributions. No other computer 
package known to us combines this range of application with the correctness 
guarantees of the mcmc package, which are as strong as can be made about 
arbitrary user-specified target distributions. 

1.2. Geometric ergodicity and random-walk Metropolis. A random-walk
Metropolis sampler is not necessarily geometrically ergodic, but its
geometric ergodicity has received more attention [Mengersen and Tweedie
(1996), Roberts and Tweedie (1996), Jarner and Hansen (2000)] than that of
any other MCMC sampler, except perhaps independence Metropolis-Hastings
samplers, also terminology introduced by Tierney (1994), which are also
studied in Mengersen and Tweedie (1996) and Roberts and Tweedie (1996).
Independence Metropolis-Hastings samplers, however, do not have good
properties: they are either uniformly ergodic or not geometrically
ergodic, and they are uniformly ergodic only when the proposal
distribution is particularly adapted to $\pi$ in a way that is difficult
to achieve (whenever independence samplers work, importance sampling also
works, so MCMC is unnecessary).

To simplify the theory, Mengersen and Tweedie (1996), Roberts and
Tweedie (1996) and Jarner and Hansen (2000) restrict attention to $\pi$
that are strictly positive and continuously differentiable. In order to
build on their results, we also adopt this restriction. The geometric
ergodicity properties of the random-walk Metropolis algorithm are related
to

(2) $\limsup_{|x| \to \infty} \frac{x}{|x|} \cdot \nabla \log \pi(x)$,

where the dot indicates inner product, and $|\cdot|$ denotes the Euclidean
norm. We say $\pi$ is super-exponentially light if (2) is $-\infty$, is
exponentially light if (2) is negative and sub-exponentially light if (2)
is zero.
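
As a worked illustration of these three classes (our own example, not taken from the cited papers), consider the family $\pi(x) \propto \exp(-|x|^p)$ on $\mathbb{R}^k$ with $p > 0$, ignoring smoothness at the origin, which does not affect the limit in (2). Since $\nabla \log \pi(x) = -p |x|^{p-2} x$ for $x \neq 0$, the quantity in (2) can be computed directly:

```latex
\[
\frac{x}{|x|} \cdot \nabla \log \pi(x) = -p\,|x|^{p-1}
\longrightarrow
\begin{cases}
-\infty, & p > 1 \quad \text{(super-exponentially light)}, \\
-1,      & p = 1 \quad \text{(exponentially light)}, \\
0,       & 0 < p < 1 \quad \text{(sub-exponentially light)},
\end{cases}
\qquad |x| \to \infty .
\]
```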

None of these conditions are necessary for geometric ergodicity. A
necessary condition for the geometric ergodicity of a random-walk
Metropolis algorithm is that the target density $\pi$ have a moment
generating function [Jarner and Tweedie (2003)]. It is possible for a
density to have a moment generating function but not be even
sub-exponentially light, for example, the unnormalized density

$\pi(x) = e^{-|x|} (1 + \cos(x)), \qquad x \in \mathbb{R}$.

Following Roberts and Tweedie (1996) and Jarner and Hansen (2000), we
also restrict attention to $q$ that are bounded away from zero in a
neighborhood of zero. This includes the normal proposal distributions used
by the R package mcmc.

Theorem 1 [Jarner and Hansen (2000), Theorem 4.3]. Suppose $\pi$ is a
super-exponentially light density on $\mathbb{R}^k$ that also satisfies

(3) $\limsup_{|x| \to \infty} \frac{x}{|x|} \cdot \frac{\nabla \pi(x)}{|\nabla \pi(x)|} < 0$,

where the dot denotes inner product; then the random-walk Metropolis
algorithm with $q$ bounded away from zero on a neighborhood of zero is
geometrically ergodic.

We say $\pi$ satisfies the curvature condition to mean (3) holds. This
means the contours of $\pi$ are approximately locally linear near infinity.

Theorem 1, although useful, covers neither exponentially light densities,
which arise in Bayesian categorical data analysis with canonical parameters
and conjugate priors (Section 3.1), nor sub-exponentially light densities,
which arise in Bayesian analysis of Cauchy location models using flat
improper priors on the location parameters (Section 3.4). Roberts and
Tweedie (1996) do cover exponentially light densities, but their theorems
are very difficult to apply [Jarner and Hansen (2000) show that Roberts and
Tweedie (1996) incorrectly applied their own theorem in one case].

The key idea of this paper is to use the change-of-variable theorem in
conjunction with Theorem 1 to get results that Theorem 1 does not give
directly. Suppose $\pi_\beta$ is the (possibly multivariate) target density
of interest. We instead simulate a Markov chain having invariant density

(4) $\pi_\gamma(\gamma) = \pi_\beta(h(\gamma)) \, |\det(\nabla h(\gamma))|$,

where $h$ is a diffeomorphism. If $\pi_\beta$ is the density of the random
vector $\beta$, then $\pi_\gamma$ is the density of the random vector
$\gamma = h^{-1}(\beta)$. We find conditions on the transformation $h$ that
make $\pi_\gamma$ super-exponentially light and satisfy the curvature
condition. Then by Theorem 1, the simulated Markov chain
$\gamma_1, \gamma_2, \ldots$ is geometrically ergodic. It is easy to see
(Appendix A) that the Markov chain $\beta_i = h(\gamma_i)$,
$i = 1, 2, \ldots$, is also geometrically ergodic. Thus we achieve
geometric ergodicity indirectly, doing a change-of-variable yielding a
density that by Theorem 1 has a geometrically ergodic random-walk
Metropolis sampler, sampling that distribution, and then using the inverse
change-of-variable to get back to the variable of interest.

This indirect procedure has no virtues other than that Metropolis
random-walk samplers are well-understood and user-friendly and that we have
Theorem 1 to build on. There is other literature using drift conditions to
prove geometric ergodicity of Markov chain samplers [Geyer and Møller (1994),
Rosenthal (1995a), Hobert and Geyer (1998), Jones and Hobert (2004), Roy 
and Hobert (2007), Tan and Hobert (2009), Johnson and Jones (2010)] but 
for Gibbs samplers or other samplers for specific statistical models, hence not 
having the wide applicability of random-walk Metropolis samplers. There is 
also other literature about using variable transformation to improve the 
convergence properties of Markov chain samplers [Roberts and Sahu (1997), 
Papaspiliopoulos, Roberts and Sköld (2007), Papaspiliopoulos and Roberts
(2008)] but for Gibbs samplers not having the wide applicability of
random-walk Metropolis samplers.

It is important to understand that the necessary condition mentioned 
above [Jarner and Tweedie (2003)] places a limit on what can be done 
without variable transformation. If $\pi_\beta$ does not have a moment
generating function (any Student t distribution, e.g.), then no random-walk
Metropolis sampler for it can be geometrically ergodic (no matter what
proposal distribution is used). Thus if we use a random-walk Metropolis
sampler, then we must also use variable transformation to obtain geometric
ergodicity.

We call a function $h : \mathbb{R}^k \to \mathbb{R}^k$ isotropic if it has
the form

(5) $h(\gamma) = \begin{cases} f(|\gamma|) \frac{\gamma}{|\gamma|}, & \gamma \neq 0, \\ 0, & \gamma = 0 \end{cases}$

for some function $f : (0, \infty) \to (0, \infty)$. To simplify the theory,
we restrict attention to $h$ that are isotropic diffeomorphisms, meaning
$h$ and $h^{-1}$ are both continuously differentiable, having the further
property that $\det(\nabla h)$ and $\det(\nabla h^{-1})$ are also
continuously differentiable.

As with the restriction to $\pi$ that are strictly positive and
continuously differentiable used by Mengersen and Tweedie (1996), Roberts
and Tweedie (1996) and Jarner and Hansen (2000), this restriction is
arbitrary. It is not necessary to achieve geometric ergodicity; it merely
simplifies proofs. However, the proofs are already very complicated even
with these two restrictions. Although both these restrictions could be
relaxed, that would make the proofs even more complicated. Since many
applications can be fit into our framework, perhaps after a
change-of-variable to yield $\pi_\beta$ that is strictly positive and
continuously differentiable, we choose to not complicate our proofs
further.

Isotropic transformations (5) shrink toward or expand away from the 
origin of the state space. In practice, they should be combined with trans- 
lations so they can shrink toward or expand away from arbitrary points. 
Since translations induce isomorphic Markov chains (Appendix A), they do 
not affect the geometric ergodicity properties of random-walk Metropolis 
samplers. Hence we ignore them until Section 4. 

Our variable-transformation method is easily implemented using the R 
package mcmc [Geyer and Johnson (2012)] because that package simulates 
Markov chains having equilibrium density $\pi$ specified by a user-written
function, which can incorporate a variable transformation, and outputs an
arbitrary functional of the Markov chain specified by another user-written
function, which can incorporate the inverse transformation.

A referee pointed out that one can think of our transformation method
differently: as describing a Metropolis-Hastings algorithm in the original
parameterization. This seems to avoid variable transformation but does not,
because its proposals have the form $h(h^{-1}(\beta) + z)$, where $\beta$
is the current state, and $z$ is a simulation from the Metropolis $q$. This
uses $h$ and $h^{-1}$ in every iteration, whereas the scheme we describe
uses only $h$ to run the Markov chain for $\gamma$ and to map it back to
$\beta$, needing $h^{-1}$ only once to determine the initial state
$\gamma_1 = h^{-1}(\beta_1)$ of the Markov chain. Nevertheless, it is of
some theoretical interest that this provides hitherto unnoticed examples
of geometrically ergodic Metropolis-Hastings algorithms.

2. Variable transformation. 

2.1. Positivity and continuous differentiability. For the
change-of-variable (4) we need to know when the transformed density
$\pi_\gamma$ is positive and continuously differentiable assuming the
original density $\pi_\beta$ has these properties. If $h$ is a
diffeomorphism, then the first term on the right-hand side of (4) will be
continuously differentiable by the chain rule. Since $\nabla h^{-1}$ is the
matrix inverse of $\nabla h$ by the inverse function theorem,
$\det(\nabla h)$ can never be zero. Hence $h$ being a diffeomorphism is
enough to imply positivity of $\pi_\gamma$.

Since $\det(A)$ is continuous in $A$, being a polynomial function of the
components of $A$, $\det(\nabla h)$ can never change sign. We restrict
attention to $h$ such that $\det(\nabla h)$ is always positive, so the
absolute value in (4) is unnecessary. Then we have

(6) $\log \pi_\gamma(\gamma) = \log \pi_\beta(h(\gamma)) + \log \det(\nabla h(\gamma))$,

(7) $\nabla \log \pi_\gamma(\gamma) = \nabla (\log \pi_\beta)(h(\gamma)) \, \nabla h(\gamma) + \nabla \log \det(\nabla h(\gamma))$.

It is clear from (7) that $\log \pi_\gamma$, and hence $\pi_\gamma$, is
continuously differentiable if $h$ is a diffeomorphism and
$\det(\nabla h)$ is continuously differentiable.

2.2. Isotropic functions. In the transformation method, the induced
density $\pi_\gamma$ will need to satisfy the smoothness conditions of
Theorem 1. We require the original density $\pi_\beta$ to satisfy the
smoothness conditions of Theorem 1. The smoothness conditions will be
satisfied for $\pi_\gamma$ if the isotropic transformations are
diffeomorphisms with continuously differentiable Jacobians. The
assumptions of the following lemma provide conditions on isotropic
functions to guarantee that $\pi_\gamma$ is positive and continuously
differentiable whenever $\pi_\beta$ is.

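Lemma 1. Let $h : \mathbb{R}^k \to \mathbb{R}^k$ be an isotropic function
given by (5) with $f : [0, \infty) \to [0, \infty)$ invertible and
continuously differentiable with one-sided derivative at zero such that

(8) $f'(s) > 0, \qquad s \ge 0$.

Then

(9) $\frac{h(\gamma)}{|h(\gamma)|} = \frac{\gamma}{|\gamma|}, \qquad \gamma \neq 0$,

$f$ is a diffeomorphism, $h$ is a diffeomorphism and

(10) $h^{-1}(\beta) = \begin{cases} f^{-1}(|\beta|) \frac{\beta}{|\beta|}, & \beta \neq 0, \\ 0, & \beta = 0 \end{cases}$

and

(11) $\nabla h(\gamma) = \frac{f(|\gamma|)}{|\gamma|} I_k + \left[ f'(|\gamma|) - \frac{f(|\gamma|)}{|\gamma|} \right] \frac{\gamma \gamma^T}{|\gamma|^2}, \qquad \gamma \neq 0$,

where $I_k$ is the $k \times k$ identity matrix, and

(12) $\nabla h(0) = f'(0) I_k$.

Moreover

(13) $\det(\nabla h(\gamma)) = \begin{cases} f'(|\gamma|) \left( \frac{f(|\gamma|)}{|\gamma|} \right)^{k-1}, & \gamma \neq 0, \\ f'(0)^k, & \gamma = 0, \end{cases}$

and, under the additional assumption that $f$ is twice continuously
differentiable with one-sided derivatives at zero and

(14) $f''(0) = 0$,

(13) is continuously differentiable.

The proof of this lemma is in Appendix B.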
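
The determinant formula (13) is easy to check numerically. The sketch below does so for $k = 2$ by comparing (13) with a central-difference Jacobian of (5). The particular choice $f(s) = s + s^3$ is a hypothetical example picked only because it satisfies (8) and (14); it is not one of the transformations used later in the paper.

```python
import math


def f(s):
    # Hypothetical smooth example with f(0) = 0, f'(s) > 0 and f''(0) = 0.
    return s + s ** 3


def f_prime(s):
    return 1.0 + 3.0 * s ** 2


def h(gamma):
    """Isotropic map (5): rescale gamma so that its norm becomes f(|gamma|)."""
    r = math.hypot(*gamma)
    if r == 0.0:
        return [0.0 for _ in gamma]
    return [f(r) * g / r for g in gamma]


def det_formula(gamma):
    """Jacobian determinant of h computed from (13)."""
    k = len(gamma)
    r = math.hypot(*gamma)
    if r == 0.0:
        return f_prime(0.0) ** k
    return f_prime(r) * (f(r) / r) ** (k - 1)


def det_numeric_2d(gamma, eps=1e-6):
    """Central-difference Jacobian determinant of h, for k = 2 only."""
    cols = []
    for j in range(2):
        up = list(gamma)
        dn = list(gamma)
        up[j] += eps
        dn[j] -= eps
        h_up, h_dn = h(up), h(dn)
        # Column j holds the partial derivatives of h with respect to gamma_j.
        cols.append([(h_up[i] - h_dn[i]) / (2.0 * eps) for i in range(2)])
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]


point = [0.7, -1.2]
exact = det_formula(point)
approx = det_numeric_2d(point)
```

Because (13) needs only $f$, $f'$ and $|\gamma|$, evaluating the log Jacobian term in (6) costs essentially nothing compared with forming and factoring the $k \times k$ matrix (11).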
2.3. Inducing lighter tails. Define $f : [0, \infty) \to [0, \infty)$ by

(15) $f(x) = \begin{cases} x, & x < R, \\ x + (x - R)^p, & x \ge R, \end{cases}$

where $R \ge 0$ and $p > 2$. It is clear that (15) satisfies the
assumptions of Lemma 1.
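
To see what (15) does to the tails, the following sketch evaluates the induced log density (6) using the determinant formula (13) and estimates the radial slope in (2) by finite differences. The constants `R = 1`, `P = 3` and the smooth exponentially light target $\log \pi_\beta(\beta) = -\sqrt{1 + |\beta|^2}$ are assumptions chosen for illustration only.

```python
import math

R, P = 1.0, 3.0  # illustrative constants; (15) requires R >= 0 and p > 2


def f(x):
    """The function (15)."""
    return x if x < R else x + (x - R) ** P


def f_prime(x):
    return 1.0 if x < R else 1.0 + P * (x - R) ** (P - 1.0)


def log_pi_beta(beta):
    # Hypothetical target: smooth, positive, and exponentially light.
    return -math.sqrt(1.0 + sum(b * b for b in beta))


def log_pi_gamma(gamma):
    """Induced log density (6), with the log determinant taken from (13)."""
    k = len(gamma)
    r = math.hypot(*gamma)
    if r == 0.0:
        return log_pi_beta(list(gamma)) + k * math.log(f_prime(0.0))
    beta = [f(r) * g / r for g in gamma]
    log_det = math.log(f_prime(r)) + (k - 1) * math.log(f(r) / r)
    return log_pi_beta(beta) + log_det


def radial_slope(log_density, r, eps=1e-5):
    """Finite-difference estimate of (x/|x|) . grad log density at x = (r, 0)."""
    hi = log_density([r + eps, 0.0])
    lo = log_density([r - eps, 0.0])
    return (hi - lo) / (2.0 * eps)


slope_beta = radial_slope(log_pi_beta, 50.0)
slope_gamma_5 = radial_slope(log_pi_gamma, 5.0)
slope_gamma_10 = radial_slope(log_pi_gamma, 10.0)
```

For this target the radial slope of $\log \pi_\beta$ levels off near $-1$, while the slope of $\log \pi_\gamma$ keeps decreasing as $|\gamma|$ grows, which is the behavior that Theorem 2 below asserts in general.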

Theorem 2. Let $\pi_\beta$ be an exponentially light density on
$\mathbb{R}^k$, and let $h$ be defined by (5) and (15). Then $\pi_\gamma$
defined by (4) is super-exponentially light.

Proof of Theorem 2 is in Appendix C. 
Now define $f : [0, \infty) \to [0, \infty)$ by

(16) $f(x) = \begin{cases} e^{bx} - \frac{e}{3}, & x > \frac{1}{b}, \\ x^3 \frac{b^3 e}{6} + x \frac{b e}{2}, & x \le \frac{1}{b}, \end{cases}$

where $b > 0$. It is clear that (16) satisfies the assumptions of Lemma 1.
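
The two branches of (16) are chosen so that $f$, $f'$ and $f''$ all agree at $x = 1/b$, with $f(0) = 0$ and $f''(0) = 0$ as Lemma 1 requires. A small sketch that checks this numerically follows; the value `B = 1.0` is an arbitrary illustrative choice.

```python
import math

B = 1.0  # illustrative value; (16) only requires b > 0


def f(x):
    """The function (16)."""
    if x > 1.0 / B:
        return math.exp(B * x) - math.e / 3.0
    return x ** 3 * B ** 3 * math.e / 6.0 + x * B * math.e / 2.0


def f_prime(x):
    if x > 1.0 / B:
        return B * math.exp(B * x)
    return x ** 2 * B ** 3 * math.e / 2.0 + B * math.e / 2.0


def f_second(x):
    if x > 1.0 / B:
        return B ** 2 * math.exp(B * x)
    return x * B ** 3 * math.e


knot = 1.0 / B
just_above = knot + 1e-9  # lands on the exponential branch
jumps = [
    abs(f(just_above) - f(knot)),
    abs(f_prime(just_above) - f_prime(knot)),
    abs(f_second(just_above) - f_second(knot)),
]
```

All three entries of `jumps` are of the order of the step `1e-9`, so the cubic and exponential pieces join smoothly up to the second derivative.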

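Theorem 3. Let $\pi_\beta$ be a sub-exponentially light density on
$\mathbb{R}^k$, and suppose there exist $\alpha > k$ and $R < \infty$ such
that

(17) $\frac{\beta}{|\beta|} \cdot \nabla \log \pi_\beta(\beta) \le -\frac{\alpha}{|\beta|}, \qquad |\beta| > R$.

Let $h$ be defined by (5) and (16). Then $\pi_\gamma$ defined by (4) is
exponentially light.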

Proof of Theorem 3 is in Appendix C. 

Condition (17) is close to sharp. For example, if $\pi_\beta$ looks like a
multivariate t distribution

(18) $\pi_\beta(t) = [1 + (t - \mu)^T \Sigma^{-1} (t - \mu)]^{-(v + k)/2}$

[compare with (27) in Section 3.3], then (17) holds with $\alpha = k + v$,
and (18) is integrable if and only if $v > 0$.

Moreover, an exponential-type isotropic transformation like (16) is
necessary to obtain a super-exponentially light $\pi_\gamma$ when
$\pi_\beta$ is a multivariate t distribution. Direct calculation shows that
no polynomial-type isotropic transformation like (15) does the job.

Corollary 1. Let $\pi_\beta$ satisfy the conditions of Theorem 3, and let
$h$ be defined as the composition of those used in Theorems 2 and 3; that
is, if we denote the $h$ used in Theorem 2 by $h_1$ and denote the $h$ used
in Theorem 3 by $h_2$, then in this corollary we are using
$h = h_2 \circ h_1$ and the change of variable is
$\gamma = h_1^{-1}(h_2^{-1}(\beta))$. Then $\pi_\gamma$ defined by (4) is
super-exponentially light.

Proof. The proof follows directly from Theorems 2 and 3. □

2.4. Curvature conditions. As seen in Jarner and Hansen (2000), Ex- 
ample 5.4, being super-exponentially light is not a sufficient condition for 
the geometric ergodicity of a random-walk Metropolis algorithm. Jarner
and Hansen (2000) provide sufficient conditions for super-exponentially light 
densities. In this section, we provide sufficient conditions for sub-exponentially 
light and exponentially light densities, such that, using the transformations 
from Section 2.3 the induced super-exponential densities will satisfy the 
Jarner and Hansen (2000) sufficient conditions. 

Theorem 4. Let $\pi_\beta$ be an exponentially light density on
$\mathbb{R}^k$, and suppose that $\pi_\beta$ satisfies either of the
following conditions:

(i) $\pi_\beta$ satisfies the curvature condition (3), or

(ii) $|\nabla \log \pi_\beta(\beta)|$ is bounded as $|\beta|$ goes to infinity.

Let $h$ be defined by (5) and (15). Then $\pi_\gamma$ defined by (4)
satisfies the curvature condition (3).

Proof of Theorem 4 is in Appendix D.

For exponentially light $\pi_\beta$, condition (ii) implies condition (i).
In practice, condition (ii) may be easier to check than condition (i) (as
in Section 3.1).

Theorem 5. Let $\pi_\beta$ be a sub-exponentially light density on
$\mathbb{R}^k$, and suppose there exist $\alpha > k$ and $R < \infty$ such
that

(19) $|\nabla \log \pi_\beta(\beta)| \le \frac{\alpha}{|\beta|}, \qquad |\beta| > R$.

Let $h$ be defined by (5) and (16). Then $\pi_\gamma$ defined by (4)
satisfies condition (ii) of Theorem 4 with $\beta$ replaced by $\gamma$.

Proof of Theorem 5 is in Appendix D.

Condition (19), like (17), is close to sharp. If $\pi_\beta$ has the form
(18), then (19) holds with $\alpha = k + v$, and (18) is integrable if and
only if $v > 0$.

Corollary 2. Let $\pi_\beta$ satisfy the conditions of Theorems 3 and 5,
and let $h$ be defined as the composition of those used in Theorems 4 and
5, that is, if we denote the $h$ used in Theorem 4 by $h_1$ and denote the
$h$ used in Theorem 5 by $h_2$, then in this corollary we are using
$h = h_2 \circ h_1$ and the change of variable is
$\gamma = h_1^{-1}(h_2^{-1}(\beta))$. Then $\pi_\gamma$ defined by (4)
satisfies the curvature condition (3).

Proof. This follows directly from Theorems 5 and 4. □

To verify that a variable transformation (5) produces geometric
ergodicity, one uses Theorems 2 and 4 when the given target density
$\pi_\beta$ is exponentially light, and Corollaries 1 and 2 when
$\pi_\beta$ is sub-exponentially light. (When the given target density
$\pi_\beta$ is super-exponentially light one does not need variable
transformation to obtain geometric ergodicity if $\pi_\beta$ also satisfies
the curvature condition.)

3. Examples. 

3.1. Exponential families and conjugate priors. In this section we study 
Bayesian inference for exponential families using conjugate priors, in par- 
ticular, the case where the natural statistic is bounded in some direction, 
and the natural parameter space is all of $\mathbb{R}^k$. Examples include
logistic regression, Poisson regression with log link function and log-linear models in
categorical data analysis. In this case, we find that the posterior density, 
when it exists, is exponentially light and satisfies the curvature condition. 
Hence variable transformation using (5) and (15) makes the random-walk 
Metropolis sampler geometrically ergodic. 

An exponential family is a statistical model having log likelihood of the
form

$y \cdot \beta - c(\beta)$,

where the dot denotes inner product, $y$ is a vector statistic, $\beta$ is
a vector parameter and the function $c$ is called the cumulant function of
the family. A statistic $y$ and parameter $\beta$ that give a log
likelihood of this form are called natural or canonical. If
$y_1, \ldots, y_n$ are independent and identically distributed
observations from the family and $\bar{y}_n$ their average, then the log
likelihood for the sample of size $n$ is

$n \bar{y}_n \cdot \beta - n c(\beta)$.

The log unnormalized posterior when using conjugate priors is

(20) $w(\beta) = (n \bar{y}_n + \nu \eta) \cdot \beta - (n + \nu) c(\beta)$,

where $\nu$ is a scalar hyperparameter, and $\eta$ is a vector
hyperparameter [Diaconis and Ylvisaker (1979), Section 2]. When simulating
the posterior using MCMC, the unnormalized density of the target
distribution is $\pi(\beta) = e^{w(\beta)}$.

The convex support of an exponential family is the smallest closed convex
set containing the natural statistic with probability one. (This does not
depend on which distribution in the exponential family we use because they
are all mutually absolutely continuous.) Theorem 1 in Diaconis and
Ylvisaker (1979) says that the posterior exists; that is, $e^{w(\beta)}$ is
integrable, where $w(\beta)$ is given by (20), if and only if
$n + \nu > 0$ and $(n \bar{y}_n + \nu \eta) / (n + \nu)$ is an interior
point of the convex support. (Of course, this always happens when using a
proper prior, i.e., when $\nu > 0$ and $\eta / \nu$ is an interior point of
the convex support.)

Theorem 9.13 in Barndorff-Nielsen (1978) says that this same condition
holds if and only if the log unnormalized posterior (20) achieves its
maximum at a unique point, the posterior mode, call it $\tilde{\beta}_n$.
(Ostensibly, this theorem applies only to log likelihoods of exponential
families not to log unnormalized posteriors with conjugate priors, but
since the latter have the same algebraic form as the former, it actually
does apply to the latter.)

From the properties of exponential families [Barndorff-Nielsen (1978),
Theorem 8.1],

(21) $\nabla c(\beta) = E_\beta(Y)$.

It follows that

(22) $\nabla \log \pi(\beta) = \nabla w(\beta) = n \bar{y}_n + \nu \eta - (n + \nu) E_\beta(Y)$.

Suppose that the natural statistic is bounded in some direction, that is,
there exists a nonzero vector $\delta$ and real number $b$ such that
$y \cdot \delta \le b$ for all $y$ in the convex support. It follows that
$E_\beta(Y) \cdot \delta \le b$. Then

$\limsup_{|\beta| \to \infty} \frac{\beta}{|\beta|} \cdot \nabla \log \pi(\beta) \ge \limsup_{s \to \infty} \frac{s \delta}{|s \delta|} \cdot [n \bar{y}_n + \nu \eta - (n + \nu) E_{s\delta}(Y)] \ge \frac{(n \bar{y}_n + \nu \eta) \cdot \delta - (n + \nu) b}{|\delta|}$.

Hence (2) is not $-\infty$ and the target distribution is not
super-exponentially light.

When the convex support has nonempty interior, the cumulant function $c$
is strictly convex [Barndorff-Nielsen (1978), Theorem 7.1]. Hence (20) is
a strictly concave function. It follows from this that $\nabla c$ is a
strictly multivariate monotone function, that is,

(23) $[\nabla c(\beta_1) - \nabla c(\beta_2)] \cdot (\beta_1 - \beta_2) > 0, \qquad \beta_1 \neq \beta_2$

[Rockafellar and Wets (1998), Theorem 2.14 and Chapter 12]. It follows that

(24) $\nabla w(\beta) \cdot \frac{\beta - \tilde{\beta}_n}{|\beta - \tilde{\beta}_n|} < 0, \qquad \beta \neq \tilde{\beta}_n$,

where $w$ is given by (20), because $\nabla w(\tilde{\beta}_n) = 0$. Let
$B$ denote the boundary and $E$ denote the exterior of the ball of unit
radius centered at $\tilde{\beta}_n$. Since $c$ is infinitely
differentiable [Barndorff-Nielsen (1978), Theorem 7.2], so is $w$, and the
left-hand side of (24) is a continuous function of $\beta$. Since $B$ is
compact, the left-hand side of (24) achieves its maximum over $B$, which
must be negative, say $-\varepsilon$. For any $\beta \in E$ we have
$t \beta + (1 - t) \tilde{\beta}_n \in B$ when
$t = 1 / |\beta - \tilde{\beta}_n|$. By (23) we have

$[\nabla w(\beta) - \nabla w(t \beta + (1 - t) \tilde{\beta}_n)] \cdot \frac{\beta - \tilde{\beta}_n}{|\beta - \tilde{\beta}_n|} < 0$

because

$\beta - [t \beta + (1 - t) \tilde{\beta}_n] = (1 - t)(\beta - \tilde{\beta}_n)$

is parallel to $\beta - \tilde{\beta}_n$. Thus

$\nabla w(\beta) \cdot \frac{\beta - \tilde{\beta}_n}{|\beta - \tilde{\beta}_n|} < -\varepsilon, \qquad \beta \in E$

and

$\limsup_{|\beta| \to \infty} \nabla w(\beta) \cdot \frac{\beta - \tilde{\beta}_n}{|\beta - \tilde{\beta}_n|} \le -\varepsilon$,

and this is easily seen to be equivalent to the unnormalized density
$e^{w(\beta)}$, with $w$ given by (20), being exponentially light.

Now we check the curvature condition (3) for exponential families. In case
the natural statistic is bounded in all directions, as in logistic
regression and log-linear models, the curvature condition follows directly
because the family satisfies condition (ii) of Theorem 4 because
$\nabla \log \pi(\beta)$ is (22), and this is bounded. In case the natural
statistic is bounded in some directions but not all directions, as in
Poisson regression, we have to work harder and use condition (i) of
Theorem 4. Because

$\nabla \pi(\beta) = e^{w(\beta)} \nabla w(\beta)$

we have

$\frac{\nabla \pi(\beta)}{|\nabla \pi(\beta)|} = \frac{\nabla w(\beta)}{|\nabla w(\beta)|}$,

where $\nabla w(\beta)$ is given by (22). And from (24) and
$\nabla w(\beta) \neq 0$ for $\beta \neq \tilde{\beta}_n$, we obtain

(25) $\frac{\nabla w(\beta)}{|\nabla w(\beta)|} \cdot \frac{\beta - \tilde{\beta}_n}{|\beta - \tilde{\beta}_n|} < 0, \qquad \beta \neq \tilde{\beta}_n$,

and the rest of the proof that $\pi$ satisfies the curvature condition is
just like the proof that it is exponentially light given above except that
(25) replaces (24).



3.2. Multinomial logit regression with a conjugate prior. This example is
a special case of the example in Section 3.1.

In multinomial logit regression, using a conjugate prior is equivalent to
adding prior counts to the data cells. For observations $1, \ldots, L$,
represent these prior counts as $\xi^l \nu^l$, where $\xi^l$ is a vector
giving the prior probability for each response for the $l$th observation,
and $\nu^l$ is the prior sample size. For the $l$th observation, let the
vector $Y^l$ represent the counts in each response category,
$N^l = \sum_i Y^l_i$ be the sample size and $M^l$ be the model matrix. The
unnormalized posterior density for the regression parameter $\beta$ is
given by

(26) $\pi(\beta \mid y, n, \xi, \nu) \propto \exp\left\{ \sum_{l=1}^{L} \left[ (y^l + \xi^l \nu^l) \cdot M^l \beta - (n^l + \nu^l) \log\left( \sum_j e^{M^l_{j\cdot} \beta} \right) \right] \right\}$,

where $M^l_{j\cdot}$ is the $j$th row of the matrix $M^l$. So long as
$y^l_i + \xi^l_i \nu^l$ is positive for all $i$ and $l$ (there is data,
actual plus prior, in all cells), $\pi$ will be exponentially light, and
satisfy condition (3). Hence a random-walk Metropolis algorithm for the
density induced by the approach in Theorems 2 and 4 will be geometrically
ergodic.

3.3. Multivariate T distributions. The density of a multivariate t
distribution on $\mathbb{R}^k$ with $v$ degrees of freedom, location
parameter vector $\mu$ and scale parameter matrix $\Sigma$ is given by

(27) $\pi_\beta(t) = \frac{\Gamma[(v + k)/2]}{\Gamma[v/2] (v \pi)^{k/2} \det(\Sigma)^{1/2}} \left[ 1 + \frac{1}{v} (t - \mu)^T \Sigma^{-1} (t - \mu) \right]^{-(v + k)/2}$

so

(28) $\nabla \log \pi_\beta(t) = \frac{-(v + k) \Sigma^{-1} (t - \mu)}{v + (t - \mu)^T \Sigma^{-1} (t - \mu)}$,

which implies

(29) $t \cdot \nabla \log \pi_\beta(t) \to -(v + k), \qquad \text{as } |t| \to \infty$,

so (27) is sub-exponentially light.
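
The limit (29) can be observed numerically in the univariate case $k = 1$. The sketch below codes (28) with a scalar scale parameter; the function names, the evaluation points and the parameter values $v = 3$, $\mu = 2$ are assumptions made for the illustration.

```python
def grad_log_t(t, v, mu=0.0, sigma2=1.0):
    """Gradient (28) for k = 1, where Sigma is the scalar sigma2."""
    d = t - mu
    return -(v + 1.0) * (d / sigma2) / (v + d * d / sigma2)


def tail_products(v, mu=0.0, sigma2=1.0, points=(1e2, 1e4, 1e6)):
    """Values of t * grad log pi(t), which (29) says approach -(v + k)."""
    return [t * grad_log_t(t, v, mu, sigma2) for t in points]


products = tail_products(3.0, mu=2.0)
far_gradient = grad_log_t(1e6, 3.0, mu=2.0)
```

The entries of `products` approach $-(v + 1) = -4$, while the gradient itself tends to zero, so the quantity in (2) is zero and the density is sub-exponentially light.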

The condition of Theorem 3 is also implied by (29). To check the condition
of Theorem 5 we calculate

$|\nabla \log \pi_\beta(t)|^2 \le \frac{(v + k)^2 \lambda_{\max}^2 |t - \mu|^2}{(\lambda_{\min} |t - \mu|^2)^2}$,

where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest
eigenvalues of $\Sigma^{-1}$. Hence

$|\nabla \log \pi_\beta(t)| \le \frac{(v + k) \lambda_{\max}}{\lambda_{\min} |t - \mu|}$,

and the condition of Theorem 5 also holds. So a random-walk Metropolis
algorithm for the induced density $\pi_\gamma$ that uses the transformation
described in Corollaries 1 and 2 will be geometrically ergodic, and the
inverse transformed Markov chain will be geometrically ergodic for
$\pi_\beta$. Since the multivariate t distribution does not have a moment
generating function, no random-walk Metropolis algorithm for $\pi_\beta$ is
geometrically ergodic [Jarner and Tweedie (2003)]. Variable transformation
is essential.

The case $k = 1$ gives the univariate t distribution, which has been widely
used as an example of a Harris ergodic random-walk Metropolis algorithm
that is not geometrically ergodic [Mengersen and Tweedie (1996), Jarner
and Hansen (2000), Jarner and Tweedie (2003), Jarner and Roberts (2007)].

3.4. Cauchy location models and flat priors. The t distribution with one
degree of freedom is the Cauchy distribution. Consider a Cauchy location
family with flat prior, so the posterior density for sample size one is
again a Cauchy distribution

$\pi_\beta(\mu) = \frac{1}{\pi} \cdot \frac{1}{1 + (x - \mu)^2}$,

and, this being a special case of the preceding section, this density is
sub-exponentially light.

For a sample of size $n$ the unnormalized posterior density is

$\prod_{i=1}^{n} \frac{1}{1 + (x_i - \mu)^2}$

and the posterior distribution is no longer a brand name distribution. It
is still easily shown to be sub-exponentially light and to satisfy the
conditions of Theorems 3 and 5.

4. Discussion. The transformations in Theorems 2 and 3 will always
induce a density with tails at least as light as the original density. If
the original density satisfies the curvature condition, then the
transformation from Theorem 2 will induce a density that satisfies the
curvature condition. Thus applying the transformation from Theorem 2 to a
super-exponentially light density that satisfies the curvature condition will 
induce another super-exponentially light density that satisfies the curvature 
condition. We do not recommend transformation when the original density 
already satisfies the conditions of Theorem 1, but it seems this will do no 
harm. 

The transformation method introduced here can be a mixed blessing. It can
produce geometric ergodicity, but may cause other problems. For example,
$\pi_\gamma$ given by (4) can be multimodal when $\pi_\beta$ is unimodal.
Thus we want a less extreme member of the family of transformations that
does the job. The idea is to pull in the tails enough to get geometric
ergodicity without much affecting the main part of the distribution.
Although very extreme transformations work in theory, they are problematic
in practice due to inexactness of computer arithmetic.

As mentioned in the Introduction, in practice one combines the transformations 
introduced in Section 2.3 with translations. Let $t_\lambda$ denote the 
translation $x \mapsto x + \lambda$. Then in the exponentially light $\pi_\beta$ case, we use the 
transformation $h = t_\lambda \circ h_{R,p}$, where $h_{R,p}$ is the $h$ defined by (5) and (15), so 
the change-of-variable is $\gamma = h_{R,p}^{-1}(\beta - \lambda)$. This gives users three adjustable 
constants, $\lambda$, $R$ and $p$, to experiment with to improve the mixing of the 
sampler. If $\pi_\beta$ satisfies the assumptions of Theorems 2 and 4, then any valid 
values of $\lambda$, $R$ and $p$ result in a geometrically ergodic sampler. Observe that 
the restriction of this $h$ to the ball of radius $R$ centered at $\lambda$ is a translation, 
which does not affect the shape of the distribution. Thus one wants 
to choose $\lambda$ near the center of the distribution (perhaps the mode of $\pi_\beta$, if 
it has one) and $R$ large enough so that a large part of the probability is in 
this ball where the shape is unchanged. The parameter $p$ should always be 
chosen to be small, say 3 or 2.5 (recall that $p > 2$ is required); 3 is a good 
choice, as then $f$ has a closed-form expression for its inverse. 
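
The change of variable for the exponentially light case can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the radial function is the one in (37) integrated, that is $f(x) = x$ for $x < R$ and $f(x) = x + (x - R)^p$ otherwise, and the inverse of $f$ is found by bisection rather than by the closed-form expression available for $p = 3$.

```python
import math

def f_poly(x, R, p):
    """Radial map: identity inside radius R, polynomial growth outside."""
    return x if x < R else x + (x - R) ** p

def f_poly_inverse(y, R, p, tol=1e-12):
    """Invert f_poly by bisection; f_poly is strictly increasing."""
    if y < R:
        return y
    lo, hi = R, y  # f_poly(x) >= x, so the solution lies in [R, y]
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_poly(mid, R, p) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def _rescale(vec, radial):
    """Keep the direction of vec and change its norm to radial(norm)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    scale = radial(norm) / norm
    return [scale * v for v in vec]

def beta_from_gamma(gamma, lam, R, p):
    """beta = h(gamma): isotropic map followed by translation by lam."""
    moved = _rescale(gamma, lambda r: f_poly(r, R, p))
    return [m + l for m, l in zip(moved, lam)]

def gamma_from_beta(beta, lam, R, p):
    """gamma = inverse isotropic map applied to (beta - lam)."""
    shifted = [b - l for b, l in zip(beta, lam)]
    return _rescale(shifted, lambda r: f_poly_inverse(r, R, p))

if __name__ == "__main__":
    lam, R, p = [1.0, 2.0], 2.0, 3.0
    beta = beta_from_gamma([3.0, -4.0], lam, R, p)
    print(beta)                             # pulled far out: |gamma| = 5 > R
    print(gamma_from_beta(beta, lam, R, p)) # recovers approximately [3, -4]
    print(beta_from_gamma([0.5, 0.5], lam, R, p))  # inside the ball: pure translation
```

A sampler would run random-walk Metropolis on `gamma` and report `beta_from_gamma(gamma, ...)` for each state; the sketch covers only the maps, not the sampler or the Jacobian term in (4).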

In the sub-exponentially light $\pi_\beta$ case, we use the transformation 
$h = t_\lambda \circ h_b \circ h_{R,p}$, where $h_b$ is the $h$ defined by (5) and (16), and the other two 
transformations are as above, so the change-of-variable is 
$\gamma = h_{R,p}^{-1}(h_b^{-1}(\beta - \lambda))$. This gives users four adjustable constants, $\lambda$, $R$, $p$ and $b$, to experiment 
with to improve the mixing of the sampler. If $\pi_\beta$ satisfies the assumptions 
of Corollaries 1 and 2, then any valid values of $\lambda$, $R$, $p$ and $b$ result in a 
geometrically ergodic sampler. One should choose the first three as discussed 
above, and $b$ should be chosen to be small, say 0.1 or 0.01. 

Admittedly, our methods do not guarantee geometric ergodicity without 
any theoretical analysis. Users must understand the tail behavior of the tar- 
get distribution in order to select the correct transformation. For distribu- 
tions with well behaved tails, this analysis may be easy, as in our examples. 
We can say that our methods are no more difficult to apply than the current 
state of the art [Jarner and Hansen (2000)] and are applicable to a much 
larger class of models. 

APPENDIX A: ISOMORPHIC MARKOV CHAINS 

We say measurable spaces are isomorphic if there is an invertible bimeasurable 
mapping between them ($h$ bimeasurable means both $h$ and $h^{-1}$ 
are measurable). We say probability spaces $(S, \mathcal{A}, P)$ and $(T, \mathcal{B}, Q)$ are 
isomorphic if there is an invertible bimeasurable mapping $h : S \to T$ such that 
$P = Q \circ h$, meaning 

$$P(A) = Q(h(A)), \qquad A \in \mathcal{A},$$

which also implies $Q = P \circ h^{-1}$. We say Markov chains on state spaces $(S, \mathcal{A})$ 
and $(T, \mathcal{B})$ are isomorphic if there is an invertible bimeasurable mapping 
$h : S \to T$ such that the corresponding initial distributions $\mu$ and $\nu$ and the 
transition probability kernels $P$ and $Q$ satisfy $\mu = \nu \circ h$ and 

$$P(x, A) = Q(h(x), h(A)), \qquad x \in S \text{ and } A \in \mathcal{A}. \tag{30}$$

By the change-of-variable theorem for measures, (30) implies 

$$P^n(x, A) = Q^n(h(x), h(A)), \qquad n \in \mathbb{N} \text{ and } x \in S \text{ and } A \in \mathcal{A}. \tag{31}$$

It follows that $P$ has an irreducibility measure if and only if $Q$ has an 
irreducibility measure. It also follows from the change-of-variable theorem 
that $\eta$ is an invariant measure for $P$ if and only if $\eta \circ h^{-1}$ is an invariant 
measure for $Q$. Thus $P$ is null recurrent if and only if $Q$ is, and $P$ is positive 
recurrent if and only if $Q$ is. Also $P$ is reversible with respect to $\eta$ if and 
only if $Q$ is reversible with respect to $\eta \circ h^{-1}$. 

For Harris recurrence we use the criterion that a recurrent Markov chain is 
Harris if and only if every bounded harmonic function is constant [Nummelin 
(1984), Theorem 3.8 combined with his Proposition 3.9 and Theorem 8.0.1 
of Meyn and Tweedie (2009)]. A function g is harmonic for a kernel P if 
g = Pg, meaning 

$$g(x) = \int P(x, dy) g(y), \qquad x \in S.$$

It is clear that $g$ is harmonic for $P$ if and only if $g \circ h^{-1}$ is harmonic for $Q$. 
Thus P is Harris recurrent if and only if Q is. 

Suppose $P$ is irreducible and periodic. This means [Meyn and Tweedie 
(2009), Proposition 5.4.1] there are disjoint sets $D_0, \ldots, D_{d-1}$ with $d \ge 2$ 
that are a partition of $S$ such that 

$$P(x, D_{i+1 \bmod d}) = 1, \qquad x \in D_i,\ i = 0, \ldots, d - 1.$$

But then 

$$Q(y, h^{-1}(D_{i+1 \bmod d})) = 1, \qquad y \in h^{-1}(D_i),\ i = 0, \ldots, d - 1,$$

and the sets $h^{-1}(D_i)$ partition $T$, so $Q$ is also periodic. Thus isomorphic 
irreducible Markov chains are both periodic or both aperiodic. 

Finally suppose $\pi$ is an invariant probability measure for $P$, and $\mu$ is any 
probability measure on the state space. Then $\psi = \pi \circ h^{-1}$ is an invariant 
probability measure for $Q$, and it is clear that 

$$\|\mu P^n - \pi\| = \|\nu Q^n - \psi\|, \qquad n \in \mathbb{N},$$

where $\|\cdot\|$ denotes total variation norm and $\nu = \mu \circ h^{-1}$. A Markov chain 
is geometrically ergodic if there exists a nonnegative-real-valued function $M$ 
and constant $r < 1$ such that 

$$\|P^n(x, \cdot) - \pi(\cdot)\| \le M(x) r^n, \qquad \text{for all } x \tag{32}$$

[Meyn and Tweedie (2009), Chapter 15]. If $M$ is bounded, then the Markov 
chain is uniformly ergodic [Meyn and Tweedie (2009), Chapter 16]. If (32) 
holds with $r^n$ replaced by $n^r$ for some $r < 0$, then the Markov chain is 
polynomially ergodic [Jarner and Roberts (2002)]. Thus, if a Markov chain 
is polynomially ergodic, geometrically ergodic, or uniformly ergodic, then 
any isomorphic Markov chain has the same property. 

The following summarizes the discussion in this appendix. 



Theorem 6 (Isomorphic Markov chains) . If a Markov chain has one of 
the following properties, irreducibility, reversibility, null recurrence, positive 
recurrence, Harris recurrence, aperiodicity, polynomial ergodicity, geometric 
ergodicity, uniform ergodicity, then so does any isomorphic Markov chain. 
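
On a finite state space the content of this appendix can be checked directly, because an invertible bimeasurable map is just a relabeling of states. The sketch below is only an illustration under assumptions of our own choosing: the three-state kernel, the permutation `perm` playing the role of $h$, and all function names are hypothetical. It builds the relabeled kernel and confirms that the total variation distances to stationarity agree step by step for the two chains. The factor 1/2 in `total_variation` is one common convention for the norm; the equality does not depend on it.

```python
def step(dist, kernel):
    """One step of the chain: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * kernel[i][j] for i in range(n)) for j in range(n)]

def total_variation(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def relabel_kernel(kernel, perm):
    """Kernel of the isomorphic chain: Q[h(i)][h(j)] = P[i][j]."""
    n = len(kernel)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = kernel[i][j]
    return out

def relabel_dist(dist, perm):
    """Image of a distribution under the relabeling: nu[h(i)] = mu[i]."""
    out = [0.0] * len(dist)
    for i, mass in enumerate(dist):
        out[perm[i]] = mass
    return out

def tv_path(start, kernel, target, n_steps):
    """Distances to the target after 0, 1, ..., n_steps steps."""
    dist, path = list(start), []
    for _ in range(n_steps + 1):
        path.append(total_variation(dist, target))
        dist = step(dist, kernel)
    return path

P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]   # invariant for P
perm = [2, 0, 1]         # the relabeling h
mu = [1.0, 0.0, 0.0]     # initial distribution for the first chain

Q, psi, nu = relabel_kernel(P, perm), relabel_dist(pi, perm), relabel_dist(mu, perm)

if __name__ == "__main__":
    print(tv_path(mu, P, pi, 5))
    print(tv_path(nu, Q, psi, 5))  # same sequence of distances
```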



APPENDIX B: PROOF OF LEMMA 1 
That $f$ is a diffeomorphism follows from the inverse function theorem 

$$\frac{d f^{-1}(t)}{dt} = \frac{1}{f'(s)} \qquad \text{whenever } t = f(s)$$

and (8). It is clear from (5) that $|h(\gamma)| = f(|\gamma|)$ for all $\gamma$, from which (9), 
(10) and the invertibility of $h$ follow. 
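Now for $\gamma \neq 0$ we have 

$$\frac{\partial}{\partial \gamma_j} \left( \sum_{i=1}^k \gamma_i^2 \right)^{1/2} = \left( \sum_{i=1}^k \gamma_i^2 \right)^{-1/2} \gamma_j,$$

so 

$$\nabla |\gamma| = \frac{\gamma^T}{|\gamma|},$$

and now (11) follows straightforwardly from (5), and it is clear that $h$ is 
continuously differentiable everywhere except perhaps at zero and similarly 
for $h^{-1}$. 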

The term in square brackets on the right-hand side of (11) goes to zero 
as $|\gamma| \to 0$ by the definition of derivative, and the term that multiplies 
it is bounded. Thus, if we can show (12), then $\nabla h$ is also continuous at zero. 
By the definition of derivative, what must be shown to prove (12) is that 

$$\frac{h(\gamma) - f'(0)\gamma}{|\gamma|} = \frac{f(|\gamma|)(\gamma/|\gamma|) - f'(0)\gamma}{|\gamma|} = \left[ \frac{f(|\gamma|)}{|\gamma|} - f'(0) \right] \frac{\gamma}{|\gamma|}$$

converges to zero as $\gamma \to 0$. Since the term in square brackets converges to 
zero by the definition of derivative and $\gamma/|\gamma|$ is bounded, this proves (12). 
Since the formulas for $h$ and $h^{-1}$ have the same form, this shows $h$ is a 
diffeomorphism. 

The determinant of a symmetric matrix is the product of its eigenvalues 
[Harville (1997), Theorem 21.6.1]. First, $\gamma$ is an eigenvector of $\nabla h(\gamma)$ with 
eigenvalue $f'(|\gamma|)$. Second, any vector $v$ orthogonal to $\gamma$ is also an eigenvector 
of $\nabla h(\gamma)$ with eigenvalue $f(|\gamma|)/|\gamma|$ when $\gamma \neq 0$ and eigenvalue $f'(0)$ 
when $\gamma = 0$. Since the subspace orthogonal to $\gamma$ has dimension $k - 1$, the 
multiplicity of the second kind of eigenvalue is $k - 1$. This proves (13). 
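
The eigenvalue argument can be checked numerically in dimension $k = 2$. The following sketch is illustrative only: it assumes the polynomial radial function $f(x) = x + (x - R)^p$ for $x \ge R$ (identity below $R$) with arbitrarily chosen $R = 1$ and $p = 3$, and all names are hypothetical. It compares a central finite-difference Jacobian determinant of $h$ with the product of eigenvalues $f'(|\gamma|) (f(|\gamma|)/|\gamma|)^{k-1}$.

```python
import math

R, P_EXP = 1.0, 3.0  # illustrative choices of R and p (p > 2)

def f(x):
    return x if x < R else x + (x - R) ** P_EXP

def f_prime(x):
    return 1.0 if x < R else 1.0 + P_EXP * (x - R) ** (P_EXP - 1.0)

def h(g):
    """Isotropic map in the plane: same direction, norm replaced by f(norm)."""
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        return (0.0, 0.0)
    s = f(norm) / norm
    return (s * g[0], s * g[1])

def jacobian_det_fd(g, eps=1e-6):
    """Determinant of the Jacobian of h at g by central differences."""
    cols = []
    for j in range(2):
        up, dn = list(g), list(g)
        up[j] += eps
        dn[j] -= eps
        hu, hd = h(up), h(dn)
        # cols[j][i] approximates the partial derivative of h_i in gamma_j
        cols.append([(hu[i] - hd[i]) / (2.0 * eps) for i in range(2)])
    return cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]

def jacobian_det_formula(g, k=2):
    """Product of eigenvalues: f'(|g|) once and f(|g|)/|g| with multiplicity k - 1."""
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        return f_prime(0.0) ** k
    return f_prime(norm) * (f(norm) / norm) ** (k - 1)

if __name__ == "__main__":
    for g in [(0.3, 0.4), (1.5, 2.0), (-4.0, 1.0)]:
        print(g, jacobian_det_fd(g), jacobian_det_formula(g))
```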

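For $\gamma \neq 0$ we have 

$$\nabla \det(\nabla h(\gamma)) = f''(|\gamma|) \left( \frac{f(|\gamma|)}{|\gamma|} \right)^{k-1} \frac{\gamma^T}{|\gamma|} + (k-1) f'(|\gamma|) \left( \frac{f(|\gamma|)}{|\gamma|} \right)^{k-2} \left[ \frac{f'(|\gamma|)}{|\gamma|} - \frac{f(|\gamma|)}{|\gamma|^2} \right] \frac{\gamma^T}{|\gamma|}. \tag{33}$$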



Since (13) depends on $\gamma$ only through $|\gamma|$, it has circular contours, and we 
must have 

$$\nabla \det(\nabla h(0)) = 0 \tag{34}$$

if the derivative exists. We claim the derivative (34) does exist, and (13) is 
continuously differentiable under the "additional assumptions" about second 
derivatives of $f$ of the lemma. To prove this claim we need to first show that 
(33) converges to zero as $\gamma \to 0$ and second show that (34) is the derivative 
at zero. 

Except for the behavior of the term in square brackets, the limit of (33) is 
obvious from $f(s)/s \to f'(0)$ as $s \to 0$ and $\gamma/|\gamma|$ being bounded. For the term 
in square brackets we use Taylor's theorem [Stromberg (1981), Theorem 4.34] 

$$f(s) = cs + o(s^2),$$

$$f'(s) = c + o(s),$$

where $c = f'(0)$, so 

$$\frac{f'(s)}{s} - \frac{f(s)}{s^2} = o(1),$$

and the term in square brackets in (33) goes to zero as $\gamma \to 0$, proving that 
all of (33) goes to zero as $\gamma \to 0$. 

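What must be shown to establish (34) is that 

$$\frac{\det(\nabla h(\gamma)) - \det(\nabla h(0))}{|\gamma|} = \frac{1}{|\gamma|} \left[ f'(|\gamma|) \left( \frac{f(|\gamma|)}{|\gamma|} \right)^{k-1} - f'(0)^k \right]$$

converges to zero as $\gamma \to 0$. Applying L'Hospital's rule, we have 

$$\lim_{s \downarrow 0} \frac{f'(s)[f(s)/s]^{k-1} - [f'(0)]^k}{s} = \lim_{s \downarrow 0} \left\{ f''(s) \left[ \frac{f(s)}{s} \right]^{k-1} + f'(s)(k-1) \left[ \frac{f(s)}{s} \right]^{k-2} \left[ \frac{f'(s)}{s} - \frac{f(s)}{s^2} \right] \right\}$$

and we have already shown that the limit on the right-hand side is zero. 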

APPENDIX C: PROOFS FROM SECTION 2.3 
Before we prove Theorem 2 we need two additional lemmas. 

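Lemma 2. Let $h$ be defined by (5) and (15). Then 

$$\lim_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \nabla \log \det(\nabla h(\gamma)) = 0, \tag{35}$$

where the dot indicates inner product. 

Proof. Recalling the value of $\det(\nabla h(\gamma))$ for $\gamma \neq 0$ from (13) we can 
rewrite the dot product in (35) as 

$$\frac{f''(|\gamma|)}{f'(|\gamma|)} + (k-1) \left[ \frac{f'(|\gamma|)}{f(|\gamma|)} - \frac{1}{|\gamma|} \right]. \tag{36}$$

From (15) for $x > R$ we have 

$$f'(x) = 1 + p(x - R)^{p-1}, \tag{37}$$

$$f''(x) = p(p-1)(x - R)^{p-2} \tag{38}$$

and, plugging these into (36), we see that, because $p > 2$, all terms in (36) 
go to zero like $|\gamma|^{-1}$ as $|\gamma| \to \infty$. □ 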
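
The decay asserted in Lemma 2 is easy to observe numerically. The sketch below is ours, not the paper's: the function name `lemma2_expression` is hypothetical and the constants are arbitrary. It evaluates $f''(x)/f'(x) + (k-1)[f'(x)/f(x) - 1/x]$ for $f(x) = x + (x - R)^p$, using (37) and (38) for the derivatives. A leading-order calculation suggests that $x$ times this quantity tends to $k(p-1)$, so the quantity itself vanishes at rate $1/x$.

```python
def lemma2_expression(x, k, R, p):
    """Inner product of the unit radial vector with the gradient of
    log det(grad h) at radius x > R, for the polynomial radial map."""
    f = x + (x - R) ** p
    f1 = 1.0 + p * (x - R) ** (p - 1.0)       # first derivative, as in (37)
    f2 = p * (p - 1.0) * (x - R) ** (p - 2.0)  # second derivative, as in (38)
    return f2 / f1 + (k - 1) * (f1 / f - 1.0 / x)

if __name__ == "__main__":
    k, R, p = 2, 1.0, 3.0
    for x in (1e1, 1e2, 1e3, 1e4):
        value = lemma2_expression(x, k, R, p)
        # value shrinks toward 0; x * value settles near k * (p - 1) = 4
        print(x, value, x * value)
```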



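Lemma 3. Under the assumptions of Lemma 1, 

$$\nabla h(\gamma)\gamma = f'(|\gamma|)\gamma, \qquad \gamma \in \mathbb{R}^k, \tag{39}$$

$$[\nabla h(\gamma)]^2 = \frac{f(|\gamma|)^2}{|\gamma|^2} I + \left[ f'(|\gamma|)^2 - \frac{f(|\gamma|)^2}{|\gamma|^2} \right] \frac{\gamma\gamma^T}{|\gamma|^2}, \qquad \gamma \neq 0, \tag{40}$$

$\nabla h(\gamma)$ being a symmetric matrix, and 

$$x^T [\nabla h(\gamma)]^2 x = \frac{f(|\gamma|)^2}{|\gamma|^2} |x|^2 + \left[ f'(|\gamma|)^2 - \frac{f(|\gamma|)^2}{|\gamma|^2} \right] \left( \frac{h(\gamma) \cdot x}{|h(\gamma)|} \right)^2, \qquad \gamma \neq 0. \tag{41}$$
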
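Proof. From (11) and (12), we straightforwardly obtain (39) and for 
$\gamma \neq 0$ 

$$[\nabla h(\gamma)]^2 = \frac{f(|\gamma|)}{|\gamma|} \nabla h(\gamma) + \left[ \frac{f'(|\gamma|)}{|\gamma|^2} - \frac{f(|\gamma|)}{|\gamma|^3} \right] \nabla h(\gamma) \gamma\gamma^T = \frac{f(|\gamma|)}{|\gamma|} \nabla h(\gamma) + \left[ \frac{f'(|\gamma|)^2}{|\gamma|^2} - \frac{f'(|\gamma|) f(|\gamma|)}{|\gamma|^3} \right] \gamma\gamma^T \tag{42}$$

and 

$$\frac{f(|\gamma|)}{|\gamma|} \nabla h(\gamma) = \frac{f(|\gamma|)^2}{|\gamma|^2} I + \left[ \frac{f'(|\gamma|) f(|\gamma|)}{|\gamma|^3} - \frac{f(|\gamma|)^2}{|\gamma|^4} \right] \gamma\gamma^T,$$

which plugged into (42) gives (40), and (41) is straightforward from (40). 
□ 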

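Proof of Theorem 2. Since $\nabla h(\gamma)$ is a symmetric matrix, it follows 
from (7) that 

$$\gamma \cdot \nabla \log \pi_\gamma(\gamma) = \nabla h(\gamma)\gamma \cdot \nabla \log \pi_\beta(h(\gamma)) + \gamma \cdot \nabla \log \det(\nabla h(\gamma)).$$

Hence we can bound (2) by the sum of 

$$\limsup_{|\gamma| \to \infty} \frac{\nabla h(\gamma)\gamma}{|\gamma|} \cdot \nabla \log \pi_\beta(h(\gamma)) \tag{43}$$

and 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \nabla \log \det(\nabla h(\gamma)). \tag{44}$$

It follows from (9) and (39) that for large $|\gamma|$ the dot product in (43) can be 
rewritten as 

$$f'(|\gamma|) \frac{h(\gamma)}{|h(\gamma)|} \cdot \nabla \log \pi_\beta(h(\gamma)). \tag{45}$$

Since $f'(|\gamma|)$ is always positive, and $\pi_\beta$ is exponentially light, there is an $\epsilon > 0$ 
such that (45) is bounded above by $-f'(|\gamma|)\epsilon$. It is clear that $f'(|\gamma|) \to \infty$ 
as $|\gamma| \to \infty$, so (43) is equal to $-\infty$. It follows from Lemma 2 that (44) is 
equal to zero, so (2) is equal to $-\infty$ and $\pi_\gamma$ is a super-exponentially light 
density. □ 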

Before we prove Theorem 3 we need a lemma. 
Lemma 4. Let $h$ be defined by (5) and (16). Then 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \nabla \log \det(\nabla h(\gamma)) = bk, \tag{46}$$

where the dot indicates inner product. 






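Proof. As in the proof of Lemma 2, the dot product in (46) can 
be written as (36). Clearly, $(k - 1)/|\gamma|$ goes to zero as $|\gamma|$ goes to infinity. 
Hence, (46) is equal to 

$$\limsup_{x \to \infty} \left[ \frac{f''(x)}{f'(x)} + (k-1) \frac{f'(x)}{f(x)} \right] \tag{47}$$

if the limit exists. For $x > 1/b$, it follows from (16) that 

$$f'(x) = b e^{bx},$$

$$f''(x) = b^2 e^{bx}$$

and plugging these into (47) gives 

$$\limsup_{x \to \infty} \left[ \frac{b^2 e^{bx}}{b e^{bx}} + (k-1) \frac{b e^{bx}}{e^{bx} - e/3} \right],$$

which equals $bk$. □ 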
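
The limit in Lemma 4 can also be checked numerically. This is a sketch under assumptions: the name `lemma4_expression` is hypothetical, and we take the large-argument branch of the radial function to be $f(x) = e^{bx} - c$ with a constant $c$ (set here to $e/3$, which is our reading of (16) and does not affect the limit). The ratio $f'(x)/f(x)$ is rewritten as $b / (1 - c e^{-bx})$ so that large arguments do not overflow.

```python
import math

def lemma4_expression(x, k, b, c=math.e / 3.0):
    """f''/f' + (k - 1) * (f'/f - 1/x) for f(x) = exp(b x) - c, valid for x > 1/b.
    The constant c is an assumption of this sketch; the limit does not depend on it."""
    ratio_second_first = b                              # b^2 e^{bx} / (b e^{bx})
    ratio_first_f = b / (1.0 - c * math.exp(-b * x))    # b e^{bx} / (e^{bx} - c)
    return ratio_second_first + (k - 1) * (ratio_first_f - 1.0 / x)

if __name__ == "__main__":
    k, b = 3, 0.1
    for x in (1e2, 1e3, 1e4, 1e5):
        # values approach b * k = 0.3; the 1/x term controls the rate
        print(x, lemma4_expression(x, k, b))
```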



Proof of Theorem 3. As in the proof of Theorem 2, (2) can be rewritten 
as the sum of (43) and (44), and for large $|\gamma|$ the dot product in (43) 
can be rewritten as (45). By (17) and the fact that $|h(\gamma)| = f(|\gamma|)$, (45) is 
bounded above by 

$$\limsup_{|\gamma| \to \infty} \left( -\alpha \frac{f'(|\gamma|)}{f(|\gamma|)} \right),$$

which when $f$ is given by (16) is equal to $-b\alpha$. It follows that the limit 
superior in (2) is bounded above by $-b(\alpha - k)$. Since $\alpha > k$, this upper 
bound is less than 0, so $\pi_\gamma$ is exponentially light. □ 



APPENDIX D: PROOFS FROM SECTION 2.4 

Some lemmas are needed to prove the curvature conditions for exponen- 
tially light densities. 

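Lemma 5. Let $\pi_\beta$ be an exponentially light density on $\mathbb{R}^k$, and let $h$ be 
defined by (5) and (15). Then 

$$|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)| \to \infty \qquad \text{as } |\gamma| \to \infty, \tag{48}$$

and $\pi_\gamma$ defined by (4) has the property 

$$\lim_{|\gamma| \to \infty} \frac{|\nabla \log \pi_\gamma(\gamma)|}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} = 1. \tag{49}$$
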
Proof. The square of the left-hand side of (48) is, by (41), 

$$\frac{f(|\gamma|)^2}{|\gamma|^2} |\nabla \log \pi_\beta(h(\gamma))|^2 + \left[ f'(|\gamma|)^2 - \frac{f(|\gamma|)^2}{|\gamma|^2} \right] \left( \frac{h(\gamma) \cdot \nabla \log \pi_\beta(h(\gamma))}{|h(\gamma)|} \right)^2; \tag{50}$$

hence (48) holds if and only if (50) goes to infinity. Since the left-hand 
term of (50) is nonnegative, it is sufficient to show that the right-hand term 
goes to infinity to show that all of (50) goes to infinity. By assumption $\pi_\beta$ 
is exponentially light, and since $|h(\gamma)| = f(|\gamma|)$, there exists an $\epsilon > 0$ and 
$M < \infty$ such that 

$$\frac{h(\gamma)}{|h(\gamma)|} \cdot \nabla \log \pi_\beta(h(\gamma)) < -\epsilon, \qquad |\gamma| > M.$$

Thus in order to prove (50) goes to infinity as $|\gamma|$ goes to infinity, it is 
sufficient to prove that the term in square brackets in (50) goes to infinity. 
Plugging in the definitions of $f$ and $f'$ from (15) and (37) for large $x$, we 
obtain 

$$f'(x)^2 - \frac{f(x)^2}{x^2} = [1 + p(x - R)^{p-1}]^2 - \frac{[x + (x - R)^p]^2}{x^2} = (p^2 - 1) x^{2p-2} + o(x^{2p-2}),$$

and since $p > 2$ by assumption, this goes to infinity as $x$ goes to infinity; 
hence (50) goes to infinity as $|\gamma|$ goes to infinity and (48) holds. 
By (7), showing that (49) is true only requires showing that 

$$\lim_{|\gamma| \to \infty} \frac{|\nabla \log \det(\nabla h(\gamma))|}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} = 0. \tag{51}$$

It follows from (13) that for $\gamma \neq 0$, 

$$\log \det(\nabla h(\gamma)) = \log f'(|\gamma|) + (k - 1) \log \frac{f(|\gamma|)}{|\gamma|}$$

and 

$$\nabla \log \det(\nabla h(\gamma)) = \left[ \frac{f''(|\gamma|)}{f'(|\gamma|)} + (k-1) \left( \frac{f'(|\gamma|)}{f(|\gamma|)} - \frac{1}{|\gamma|} \right) \right] \frac{\gamma^T}{|\gamma|}. \tag{52}$$

Plugging in the definitions of $f$, $f'$ and $f''$ from (15), (37) and (38) for large $x$, 
we see that $f''(x)/f'(x)$ and $f'(x)/f(x)$ go to zero as $x$ goes to infinity, and 
hence (52) goes to zero as $|\gamma|$ goes to infinity. Hence the numerator in (51) 
goes to zero. By (48) the denominator in (51) goes to infinity, and hence 
(51) holds. □ 
Lemma 6. Let $\pi_\beta$ be an exponentially light density on $\mathbb{R}^k$, and let $h$ be 
defined by (5) and (15). Then $\pi_\gamma$ defined by (4) has the property that 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \pi_\gamma(\gamma)}{|\nabla \pi_\gamma(\gamma)|} \tag{53}$$

(which is the limit superior in the curvature condition) is bounded above by 

$$\limsup_{|\gamma| \to \infty} f'(|\gamma|) \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \log \pi_\beta(h(\gamma))}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|}, \tag{54}$$

where the dots in both equations denote inner products. 

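Proof. We always assume that $\pi_\beta$ and $\pi_\gamma$ are positive (Section 2.1), so 
we may take logs, obtaining 

$$\frac{\nabla \log \pi_\gamma(\gamma)}{|\nabla \log \pi_\gamma(\gamma)|} = \frac{\nabla \pi_\gamma(\gamma)}{|\nabla \pi_\gamma(\gamma)|}.$$

Thus (53) can be rewritten as 

$$\limsup_{|\gamma| \to \infty} \left( \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \log \pi_\gamma(\gamma)}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} \right) \frac{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|}{|\nabla \log \pi_\gamma(\gamma)|}$$

and then we can use Lemma 5 to rewrite it as 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \log \pi_\gamma(\gamma)}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|}.$$

If we expand $\nabla \log \pi_\gamma(\gamma)$ using (7), this is bounded above by the sum of 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} \tag{55}$$

and 

$$\limsup_{|\gamma| \to \infty} \frac{\gamma}{|\gamma|} \cdot \frac{\nabla \log \det(\nabla h(\gamma))}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|}. \tag{56}$$

It follows from Lemmas 2 and 5 that (56) is zero. Hence the limsup in (53) 
is bounded above by (55), which is equal to (54) since $\nabla h(\gamma)$ is symmetric 
and $\nabla h(\gamma)\gamma = f'(|\gamma|)\gamma$. □ 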

Lemma 7. Let $a(\gamma)$ and $b(\gamma)$ be functions such that both $a$ and $b$ are 
positive and bounded away from zero and infinity as $|\gamma|$ goes to infinity. 
Then for $f$ from (15), the fraction 

$$\frac{f'(|\gamma|)^2}{\dfrac{f(|\gamma|)^2}{|\gamma|^2} a(\gamma) + \left[ f'(|\gamma|)^2 - \dfrac{f(|\gamma|)^2}{|\gamma|^2} \right] b(\gamma)} \tag{57}$$

is positive and bounded away from zero and infinity as $|\gamma|$ goes to infinity. 
Proof. The reciprocal of (57) is 

$$\frac{f(|\gamma|)^2}{f'(|\gamma|)^2 |\gamma|^2} a(\gamma) + \left[ 1 - \frac{f(|\gamma|)^2}{f'(|\gamma|)^2 |\gamma|^2} \right] b(\gamma).$$

Since $a(\gamma)$ and $b(\gamma)$ are both positive and bounded away from zero and 
infinity for large $|\gamma|$, it is sufficient to show that 

$$\frac{f(x)^2}{f'(x)^2 x^2} \tag{58}$$

is bounded away from zero and one for large $x$. For large $x$, it follows from 
(15) and (37) that (58) is equal to 

$$\frac{[x + (x - R)^p]^2}{[1 + p(x - R)^{p-1}]^2 x^2},$$

which converges to $1/p^2$ as $x \to \infty$. Since we assume $p > 2$, we are done. □ 
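
The limiting value of (58) can be confirmed numerically. The sketch below is illustrative, with a hypothetical function name and arbitrary constants. To leading order the numerator is $x^{2p}$ and the denominator is $p^2 x^{2p-2} \cdot x^2$, so the ratio should settle at $1/p^2$, which lies strictly between zero and one whenever $p > 2$.

```python
def ratio_58(x, R, p):
    """f(x)^2 / (f'(x)^2 x^2) for f(x) = x + (x - R)^p, evaluated at x > R."""
    f = x + (x - R) ** p
    f1 = 1.0 + p * (x - R) ** (p - 1.0)
    return (f / (f1 * x)) ** 2

if __name__ == "__main__":
    R = 1.0
    for p in (2.5, 3.0):
        for x in (1e2, 1e4, 1e6):
            # the ratio approaches 1 / p^2 (0.16 for p = 2.5, about 0.1111 for p = 3)
            print(p, x, ratio_58(x, R, p), 1.0 / p ** 2)
```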

Proof of Theorem 4. First, assume that condition (i) holds. By 
Lemma 6, it is enough to show that (54) is less than zero, and (54) is equal 
to, using (9), 

$$\limsup_{|\gamma| \to \infty} \frac{|\nabla \log \pi_\beta(h(\gamma))| f'(|\gamma|)}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} \left( \frac{h(\gamma)}{|h(\gamma)|} \cdot \frac{\nabla \log \pi_\beta(h(\gamma))}{|\nabla \log \pi_\beta(h(\gamma))|} \right). \tag{59}$$

Since $\pi_\beta$ satisfies condition (3), there is an $\epsilon > 0$ such that (59) is bounded 
above by 

$$\limsup_{|\gamma| \to \infty} \left( -\epsilon \frac{f'(|\gamma|) |\nabla \log \pi_\beta(h(\gamma))|}{|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)|} \right). \tag{60}$$

Because $f'(|\gamma|)$ is strictly positive, the fraction in (60) is strictly positive for 
large $|\gamma|$, hence showing that this fraction's square is bounded away from 
zero is enough to show that (60) is less than zero, and condition (3) holds. 
Let 

$$a(\gamma) = \frac{|\nabla \log \pi_\beta(h(\gamma))|^2}{|\nabla \log \pi_\beta(h(\gamma))|^2} = 1$$

and 

$$b(\gamma) = \left( \frac{\nabla \log \pi_\beta(h(\gamma)) \cdot h(\gamma)}{|\nabla \log \pi_\beta(h(\gamma))| |h(\gamma)|} \right)^2.$$

Then, using (41) as in deriving (50), the square of the fraction in (60) is 
equal to (57). The Cauchy-Schwarz inequality bounds $b(\gamma)$ above by one, 
and condition (3) bounds $b(\gamma)$ away from zero. So by Lemma 7 the square 
of the fraction in (60) is positive and bounded away from zero as $|\gamma|$ goes 
to infinity. Because this fraction itself is positive, it must also be bounded 
away from zero as $|\gamma|$ goes to infinity. Hence the limsup in (60) is negative 
and condition (3) holds for $\pi_\gamma$. 

Now assume that condition (ii) holds and $\pi_\beta$ is exponentially light, that 
is, there exist a $\beta_0 > 0$, $\epsilon > 0$ and $M_1 > M_2 > 0$ such that for $|\beta| > \beta_0$, 

$$\frac{\beta}{|\beta|} \cdot \nabla \log \pi_\beta(\beta) < -\epsilon$$

and 

$$M_2 < |\nabla \log \pi_\beta(\beta)| < M_1.$$

It follows that $1/|\nabla \log \pi_\beta(\beta)| > 1/M_1$ so $\pi_\beta$ satisfies condition (i). □ 

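Proof of Theorem 5. By (7) and the triangle inequality, $|\nabla \log \pi_\gamma(\gamma)|$ 
is bounded above by the sum 

$$|\nabla \log \pi_\beta(h(\gamma)) \nabla h(\gamma)| + |\nabla \log \det(\nabla h(\gamma))|. \tag{61}$$

Hence it is sufficient to show that both of these terms are bounded as $|\gamma|$ 
goes to infinity. 

It follows from (52) that the right-hand term in (61) is equal to 

$$\frac{f''(|\gamma|)}{f'(|\gamma|)} + (k-1) \frac{f'(|\gamma|)}{f(|\gamma|)} - (k-1) \frac{1}{|\gamma|}. \tag{62}$$

For large $y$, 

$$f(y) = e^{by} - \frac{e}{3}, \tag{63}$$

$$f'(y) = b e^{by}, \tag{64}$$

$$f''(y) = b^2 e^{by}. \tag{65}$$

So (62) is equal to 

$$b + (k-1) \frac{b e^{b|\gamma|}}{e^{b|\gamma|} - e/3} - \frac{k-1}{|\gamma|},$$

which clearly converges to $bk$ as $|\gamma|$ goes to infinity, so the right-hand term 
in (61) is bounded for large $|\gamma|$. 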

It follows from (41) as in deriving (50) and from (9) that the square of 
the left-hand term in (61) is equal to the sum of 

$$\frac{f(|\gamma|)^2}{|\gamma|^2} |\nabla \log \pi_\beta(h(\gamma))|^2 \tag{66}$$

and 

$$f'(|\gamma|)^2 \left[ 1 - \frac{f(|\gamma|)^2}{|\gamma|^2 f'(|\gamma|)^2} \right] \left( \frac{h(\gamma) \cdot \nabla \log \pi_\beta(h(\gamma))}{|h(\gamma)|} \right)^2. \tag{67}$$

It follows from (63) and (64) that the term in square brackets of (67) is 
positive and less than one for large $|\gamma|$. Since the other two terms in (67) 
are squares, (67) is nonnegative for large $|\gamma|$. Thus, applying the Cauchy-Schwarz 
inequality to the term in parentheses in (67), one bounds (67) above 
by 

$$f'(|\gamma|)^2 |\nabla \log \pi_\beta(h(\gamma))|^2. \tag{68}$$

By $f(|\gamma|) = |h(\gamma)|$ and by (19), for $|\gamma|$ large (68) is bounded above by 

$$\alpha^2 \frac{f'(|\gamma|)^2}{f(|\gamma|)^2},$$

which converges to $\alpha^2 b^2$ as $|\gamma|$ goes to infinity, and that finishes the proof 
that (61) is bounded for large $|\gamma|$ and the proof of the theorem. □ 